A unified approach to proving the converses for the quantum channel capacity theorems is presented.
These converses include the strong converse theorems for classical or quantum information transfer
with error exponents, and novel explicit upper bounds on the fidelity measures reminiscent of the
Wolfowitz strong converse for the classical channel capacity theorems. We provide a new proof
for the error exponents for classical information transfer. A long-standing problem in quantum
information theory has been to find the strong converse for the channel capacity theorem when
quantum information is sent across the channel. We give the quantum error exponent, thereby giving
a one-shot exponential upper bound on the fidelity. We then apply our results to show that the
strong converse holds for quantum information transfer across an erasure channel for maximally
entangled channel inputs.
One of the holy grails of information theory has been to prove the information-carrying capacities of
various channels [1]. The capacity identifies the maximum rate (measured as the number of bits/qubits per channel
use) at which one could transfer information reliably across the channel in the limit of a sufficiently large
number of channel uses. The capacity, for certain channels, also tells us an interesting property about the
fidelity between the message at the source and the replicated message at the receiver: the fidelity
can be made (with appropriate inputs) to go to 1 (i.e., a completely reliable transfer) for rates below capacity,
and goes to 0 (i.e., a completely unreliable transfer) for rates above capacity for any input, for a sufficiently
large number of channel uses.

Such converse theorems, where the fidelity goes to 0 for a sufficiently large number of channel uses at rates
above capacity, are referred to as strong converses. For certain channels, one can show that the fidelity
decays exponentially to zero as the number of channel uses increases for rates above capacity. An
example of a channel with no strong converse is given in Ref. [?].

A strong converse for the classical discrete memoryless channel (DMC) was given by Wolfowitz [?]. In a
simpler form, it showed that for rates above capacity, $1 - P_e$ ($P_e$ denotes the probability of decoding error)
can be bounded from above by two terms: one that decays as $1/n$ and the other that decays exponentially with
$n$, where $n$ is the number of channel uses.

Arimoto provided a different strong converse with $P_e \to 1$ exponentially with $n$ using the error exponents
[10]. These error exponents were known from the work of Gallager, who used them to give an upper bound to
show $P_e \to 0$ exponentially for rates below capacity [11].
An important problem in quantum information theory has been to find the capacity of a quantum channel
for classical information transfer [12-14]. Winter provided a strong converse which guarantees that for rates
above capacity $P_e \to 1$ as $n \to \infty$. Ogawa and Nagaoka gave an Arimoto-like strong converse where
they showed that $P_e \to 1$ exponentially with $n$ [16].

The channel inputs used in the strong converse theorems mentioned above were unentangled across channel
uses. In a fully general channel converse, such a restriction would not be made. König and Wehner provide
a strong converse for entangled inputs for a subclass of channels for which a single-letter formula for capacity
is available [17]. More strong converse theorems, not necessarily in the context of channel capacity, can be
found in Refs. [?].



Polyanskiy, Poor & Verdú and Polyanskiy & Verdú (see Refs. [23], [24]) provided a unified converse for
the classical channel capacity theorem, and such a converse yields, among others, the Arimoto converse (Ref.
[10]), the Wolfowitz converse (Ref. [?]) and the Fano inequality (Ref. [?]).

One of their building blocks has been the use of the monotonicity (or data processing inequality) of the
divergences in the unified converse. It is interesting to note that a similar approach was followed by Blahut in
giving an alternate proof of the Fano inequality in 1976 [25]. This technique was also used by Han and Verdú
to generalize the Fano inequality [26].

Instead of the relative entropy employed by Blahut, Polyanskiy-Poor-Verdú used generalized divergences that
satisfy the monotonicity and other properties. The approach translated the promise of a communication
protocol (or a code) of delivering a rate under some fidelity constraints to an upper bound on the fidelity. They
also used some derived quantities defined by Csiszár in Ref. [27], who gave their operational characterizations
in terms of block coding and hypothesis testing and related them to Gallager's error exponent. These quantities
play a critical role in the strong converse theorems.

We note that Csiszár's approach in Ref. [27] was generalized to the quantum domain by Mosonyi and Hiai
in Ref. [28] to provide an operational interpretation of the quantum $\alpha$-relative entropies, but there has been
no connection made between Csiszár's quantities and the strong converse theorems.

A related work is the one by the first author, which uses monotonicity to prove the generalized
quantum Fano inequality in Ref. [29]. The Fano inequality is widely used in the converse (not strong) channel
capacity theorem proofs. Note that the classical Fano inequality has no special relation with the quantum Fano
inequality (the former not being a special case of the latter), and the technique to generalize the quantum Fano
inequality is inherently 'quantum'. Perhaps the only common thread between the quantum and the classical
proofs (the latter dating back to Blahut's work) has been the use of monotonicity.

Our approach in this paper is to carry this common thread of using monotonicity further and to
provide a unified approach to strong converses, such as the quantum generalizations of Arimoto's and
Wolfowitz's, with explicit bounds for the latter. We note that no quantum version of a Wolfowitz-like bound
was known. We build on the above-mentioned works in the classical and quantum domains to first list the
properties of generalized quantum divergences that we shall need for our proofs. In particular, we show that
the quantum Rényi divergences and a non-commutative hockey-stick divergence, which we define, satisfy these
properties and suffice to give Arimoto-like bounds with error exponents and also Wolfowitz-like bounds.

We then apply our approach to two quantum channel capacity theorems, namely sending classical information
across a quantum channel and sending quantum information across a quantum channel. We note that
the strong converse for the latter problem has been open for quite some time.

The organization of the paper is as follows. In Section II, we list the properties we desire from the
generalized divergences that can be leveraged for the strong converse theorems. We then derive quantities based on
these divergences similar to Csiszár in Ref. [27].
In Section III, we first define the information-processing task for sending classical information across
a quantum channel and then prove a converse for the generalized divergences. We then take specific examples
of divergences to yield the two converses: one coinciding with the Ogawa and Nagaoka converse but with an
alternate proof, and a second which is Wolfowitz-like.

In Sec. IV, we repeat the above for sending quantum information across a quantum channel. The results
are a quantum error exponent reminiscent of the Gallager/Arimoto exponent and then Wolfowitz-like bounds. We
give sufficient conditions for the strong converse to hold in general. Lastly, we provide a strong converse for
the quantum erasure channel for maximally entangled channel inputs.

The proofs of many lemmas are given in the Appendix to make the paper easier to read. We use
the following notation throughout the paper. All logarithms are natural logarithms. We assume that
all quantum systems are finite dimensional. For a given Hilbert space $\mathcal{H}_A$ describing quantum system $A$,
let $\mathcal{S}(\mathcal{H}_A)$ denote the set of all density matrices on $\mathcal{H}_A$ and let $|A|$ be the dimension of $\mathcal{H}_A$. $\mathbb{1}$ denotes the
identity matrix whose dimensions will be clear from the context. For a given square matrix $\rho$ and a scalar
$x$, $\rho + x$ means $\rho + x\mathbb{1}$. A quantum operation is a completely positive and trace preserving
(CPTP) map, and we use quantum operation, quantum channel, and CPTP map synonymously.

The von Neumann entropy of a quantum state $\rho$ in system $A$ is denoted by $H(A)_\rho$, and if $\sigma^{AB}$ is a bipartite
state in $AB$, then the quantum mutual information is given by

$$ I(A;B)_\sigma := H(A)_\sigma + H(B)_\sigma - H(A,B)_\sigma. \tag{1} $$

The coherent information of $\sigma^{AB}$ is given by

$$ I(A\rangle B)_\sigma := H(B)_\sigma - H(A,B)_\sigma. \tag{2} $$

The projector $P_{\{\rho-\sigma \geq 0\}}$ is the projector onto the positive part of $\rho - \sigma$. For a pure state $|\phi\rangle$, we denote $|\phi\rangle\langle\phi|$
by $\phi$. The fidelity between a pure and a mixed state is defined as $F(|\phi\rangle,\rho) = \langle\phi|\rho|\phi\rangle$.



II. GENERALIZED DIVERGENCES 

Let us denote a generalized divergence for positive matrices from $\rho$ to $\sigma$ by $\mathcal{D}(\rho\|\sigma)$ that satisfies the
following properties:

1. $\mathcal{D}(\rho\|\sigma)$ satisfies the monotonicity property (or the data processing inequality), i.e., for any CPTP map
$\mathcal{E}$, we have

$$ \mathcal{D}(\rho\|\sigma) \geq \mathcal{D}[\mathcal{E}(\rho)\|\mathcal{E}(\sigma)]. \tag{3} $$

2. For any quantum state $\kappa$,

$$ \mathcal{D}(\rho\otimes\kappa\|\sigma\otimes\kappa) = \mathcal{D}(\rho\|\sigma). \tag{4} $$

3. Let $\Pi_0 = |0\rangle\langle 0|$ and $\Pi_1 = |1\rangle\langle 1|$ be two projectors with $\Pi_0 + \Pi_1 = \mathbb{1}$.

For $\alpha, \beta \in [0,1]$, let $\rho = (1-\alpha)\Pi_0 + \alpha\Pi_1$, $\sigma = \beta\Pi_0 + (1-\beta)\Pi_1$, and let us define

$$ d^{(c)}(1-\alpha\|\beta) := \mathcal{D}(\rho\|\sigma). \tag{5} $$

Then $d^{(c)}(1-\alpha\|\beta)$ is independent of the choice of $\{\Pi_0,\Pi_1\}$ and decreasing in $\alpha$ for all $\alpha < 1-\beta$.
Let $\alpha \in [0,1]$, $\beta \in (0,1]$, $\rho = \alpha\Pi_0 + (1-\alpha)\Pi_1$, $\sigma = \beta\Pi_0 + (1/\beta - \beta)\Pi_1$, and let us define

$$ d^{(q)}(\alpha\|\beta) := \mathcal{D}(\rho\|\sigma). \tag{6} $$

Note that $\sigma \geq 0$ but does not have unit trace. Then $d^{(q)}(\alpha\|\beta)$ is independent of the choice of $\{\Pi_0,\Pi_1\}$
and increasing in $\alpha$ for all $\alpha > \beta$.

For our purposes, it is not necessary that all these properties are satisfied by a chosen divergence $\mathcal{D}$.
Nevertheless, we give some examples below that satisfy all the above properties.

Some Rényi divergences: For $\rho, \sigma > 0$, the Rényi divergence from $\rho$ to $\sigma$ of order $\alpha$, $\alpha \in [0,2]\setminus\{1\}$, is
defined as

$$ D_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log \mathrm{Tr}\,\rho^{\alpha}\sigma^{1-\alpha}, \tag{7} $$

and the limit is taken at $\alpha = 1$. The monotonicity property is proved in Ref. [30] (see Example 4.5) and the other
properties are not difficult to show.
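As an illustration of Property 3 for the Rényi divergence, the following is a minimal Python sketch restricted to the commuting case, where $\rho$ and $\sigma$ are diagonal in the same basis and the trace reduces to a sum over eigenvalues. The helper names `renyi_divergence` and `d_c` and the sample values are our own assumptions, not part of the paper.

```python
import math

def renyi_divergence(p, q, order):
    """Renyi divergence (natural log) between commuting positive matrices,
    represented by their eigenvalue lists p and q in a shared eigenbasis."""
    if order == 1:
        raise ValueError("order 1 is defined only as a limit")
    total = sum(pi ** order * qi ** (1 - order) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (order - 1)

def d_c(alpha, beta, order):
    """d^(c)(1 - alpha || beta) of Property 3, specialised to the Renyi divergence."""
    rho = [1 - alpha, alpha]    # (1 - alpha) Pi_0 + alpha Pi_1
    sigma = [beta, 1 - beta]    # beta Pi_0 + (1 - beta) Pi_1
    return renyi_divergence(rho, sigma, order)

beta = 0.2
values = [d_c(alpha, beta, order=2) for alpha in (0.1, 0.3, 0.5, 0.8)]
print(values)  # expected to decrease as alpha grows towards 1 - beta
```

At $\alpha = 1-\beta$ the two binary states coincide, so the divergence vanishes there, which is the boundary of the range in which Property 3 asks for monotonicity.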



Non-commutative hockey-stick divergence: It has been shown in Refs. [23], [24] that the classical Wolfowitz
converse giving explicit bounds can be obtained using the $f$-divergence with

$$ f(x) = (x-\gamma)^{+}, \tag{8} $$

where $x^{+} = x$ if $x \geq 0$ and $0$ otherwise. This function is known as the hockey-stick function and has
applications in finance [31]. It might be tempting to define a quantum $f$-divergence (see Refs. [32, 33])
using $f(x)$, but $f(x)$ is not operator convex [34] and operator convexity is typically used for proving the
monotonicity property. However, there is a workaround. Let the Jordan decomposition of a square matrix $\kappa$
be given by $\kappa = \kappa^{+} - \kappa^{-}$, where $\kappa^{+}$ and $\kappa^{-}$ are the positive and the negative parts of $\kappa$. Then we could
define the non-commutative hockey-stick divergence (or simply hockey-stick divergence) as

$$ \mathcal{D}(\rho\|\sigma) = \mathrm{Tr}(\rho - \gamma\sigma)^{+}, \tag{9} $$

where $\gamma > 1$. Note that this is not an $f$-divergence in the sense of Refs. [32, 33]. In fact, it is related to the trace
distance since $2(x-\gamma)^{+} = |x-\gamma| + (x-\gamma)$, and hence, for quantum states $\rho$ and $\sigma$, we have

$$ 2\mathcal{D}(\rho\|\sigma) = \mathrm{Tr}|\rho-\gamma\sigma| + \mathrm{Tr}(\rho-\gamma\sigma) \tag{10} $$

$$ = \|\rho-\gamma\sigma\|_1 + (1-\gamma). \tag{11} $$

The monotonicity follows from Lemma 4 in the Appendix and the other properties are not difficult to show.
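The relation between the hockey-stick divergence and the trace norm can be checked numerically. The sketch below is only an illustration for real symmetric $2\times 2$ matrices, whose eigenvalues have a closed form; the function names and the sample matrices are hypothetical choices of ours.

```python
import math

def sym2_eigenvalues(m):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return (mean - radius, mean + radius)

def difference(rho, sigma, gamma):
    """The matrix rho - gamma * sigma."""
    return [[rho[i][j] - gamma * sigma[i][j] for j in range(2)] for i in range(2)]

def hockey_stick(rho, sigma, gamma):
    """Tr (rho - gamma*sigma)^+ : the sum of the positive eigenvalues."""
    return sum(max(ev, 0.0) for ev in sym2_eigenvalues(difference(rho, sigma, gamma)))

def trace_norm_of_difference(rho, sigma, gamma):
    """|| rho - gamma*sigma ||_1 : the sum of the absolute eigenvalues."""
    return sum(abs(ev) for ev in sym2_eigenvalues(difference(rho, sigma, gamma)))

rho = [[0.7, 0.2], [0.2, 0.3]]      # unit trace, positive definite
sigma = [[0.5, 0.1], [0.1, 0.5]]    # unit trace, positive definite
gamma = 1.2
lhs = 2 * hockey_stick(rho, sigma, gamma)
rhs = trace_norm_of_difference(rho, sigma, gamma) + (1 - gamma)
print(lhs, rhs)  # the two sides should agree up to rounding
```

The agreement relies only on both states having unit trace, so that the eigenvalues of $\rho - \gamma\sigma$ sum to $1-\gamma$.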

A. Derived quantities 

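We now define two quantities for bipartite states $\rho^{AB} \in \mathcal{S}(\mathcal{H}_A\otimes\mathcal{H}_B)$ derived from the generalized
divergence as

$$ \mathcal{K}^{(c)}(A;B)_\rho := \inf_{\sigma^B \in \mathcal{S}(\mathcal{H}_B)} \mathcal{D}(\rho^{AB}\|\rho^{A}\otimes\sigma^{B}), \tag{12} $$

$$ \mathcal{K}^{(q)}(A;B)_\rho := \inf_{\sigma^B \in \mathcal{S}(\mathcal{H}_B)} \mathcal{D}(\rho^{AB}\|\mathbb{1}\otimes\sigma^{B}), \tag{13} $$

where $\rho^{A} = \mathrm{Tr}_B\,\rho^{AB}$. We now have the following lemma that shows that both the above derived quantities
satisfy the data processing inequality.

Lemma 1. Let $\mathcal{E}^{B\to C}$ be a CPTP map and $\rho^{AC} = \mathcal{E}^{B\to C}(\rho^{AB})$. Then

$$ \mathcal{K}^{(c)}(A;B)_\rho \geq \mathcal{K}^{(c)}(A;C)_\rho, \tag{14} $$

$$ \mathcal{K}^{(q)}(A;B)_\rho \geq \mathcal{K}^{(q)}(A;C)_\rho. \tag{15} $$

Let $\rho^{ABC} = \rho^{AB}\otimes\rho^{C}$. Then

$$ \mathcal{K}^{(c)}(A;B)_\rho \geq \mathcal{K}^{(c)}(A;BC)_\rho. \tag{16} $$

Proof. For any $\delta > 0$, there exists a $\sigma^{B}$ such that $\mathcal{K}^{(c)}(A;B)_\rho \geq \mathcal{D}(\rho^{AB}\|\rho^{A}\otimes\sigma^{B}) - \delta$. We now have

$$ \mathcal{K}^{(c)}(A;B)_\rho \geq \mathcal{D}(\rho^{AB}\|\rho^{A}\otimes\sigma^{B}) - \delta \tag{17} $$

$$ \geq \mathcal{D}[\rho^{AC}\|\rho^{A}\otimes\mathcal{E}^{B\to C}(\sigma^{B})] - \delta \tag{18} $$

$$ \geq \inf_{\sigma^{C}} \mathcal{D}(\rho^{AC}\|\rho^{A}\otimes\sigma^{C}) - \delta \tag{19} $$

$$ = \mathcal{K}^{(c)}(A;C)_\rho - \delta. \tag{20} $$

Since this is true for any $\delta > 0$, the result follows. The proof of (15) is similar and we omit it. To show (16),
note that

$$ \mathcal{K}^{(c)}(A;B)_\rho = \inf_{\sigma^{B}} \mathcal{D}(\rho^{AB}\|\rho^{A}\otimes\sigma^{B}) \tag{21} $$

$$ = \inf_{\sigma^{B}} \mathcal{D}(\rho^{AB}\otimes\rho^{C}\|\rho^{A}\otimes\sigma^{B}\otimes\rho^{C}) \tag{22} $$

$$ = \inf_{\sigma^{B}} \mathcal{D}(\rho^{ABC}\|\rho^{A}\otimes\sigma^{B}\otimes\rho^{C}) \tag{23} $$

$$ \geq \inf_{\sigma^{BC}} \mathcal{D}(\rho^{ABC}\|\rho^{A}\otimes\sigma^{BC}) \tag{24} $$

$$ = \mathcal{K}^{(c)}(A;BC)_\rho. \tag{25} $$

QED. □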

Note that the above definition of $\mathcal{K}^{(c)}$ can be easily extended to classical random variables by assuming
that the density matrices are commuting and the random variables have probability distributions given by the
eigenvalues. This definition would be the same as the one in Ref. [24] given by

$$ \mathcal{K}^{(c)}(X;Y) = \inf_{Q_Y \in \mathcal{P}_Y} \mathcal{D}(P_{XY}\|P_X \times Q_Y), \tag{26} $$

where $P_{XY}$ is the joint probability distribution of the pair of random variables $(X,Y)$, $P_X$ is the probability
distribution of $X$ that can be deduced from $P_{XY}$, and $\mathcal{P}_Y$ is the set of all probability distributions that $Y$ can
take. The following result will be useful later.

Lemma 2 (Polyanskiy and Verdú, 2010 [24]). Let the random variables $S, X, Y, \hat{S}$ form a Markov chain
$S - X - Y - \hat{S}$. Then

$$ \mathcal{K}^{(c)}(S;\hat{S}) \leq \mathcal{K}^{(c)}(X;Y). \tag{27} $$
III. CLASSICAL INFORMATION OVER QUANTUM CHANNEL
A. Information processing task and converse

For a given message source and a communication channel, a communication protocol consists of an
encoder and a decoder entrusted with the task of replicating the message at the receiver within some prescribed
error.

Suppose Alice wants to send classical information to Bob using a quantum channel. We model the
information as a uniformly distributed random variable $S$ that takes values over the set $\{1,2,\ldots,e^{n\mathcal{R}}\}$. Alice
maps $S$ to $X$ using, possibly, a randomized encoder modeled by the conditional probability distribution $P_{X|S}$,
where $X$ takes values over $\{1,2,\ldots,|\mathcal{X}|\}$. The encoder's output $x$ is then mapped to $\rho_x^{A'^n} \in \mathcal{S}(\mathcal{H}_{A'}^{\otimes n})$ and is
sent to Bob over $n$ independent uses of the channel $\mathcal{N}^{A'\to B}$. It is useful to represent the state of the input to the
channel as a cq (classical-quantum) state given by

$$ \rho^{MA'^n} = \sum_x P_X(x)\,|x\rangle\langle x|^{M}\otimes\rho_x^{A'^n}. \tag{28} $$

Bob receives his part, $B^n$, of

$$ \rho^{MB^n} = \sum_x P_X(x)\,|x\rangle\langle x|^{M}\otimes\mathcal{N}^{A'^n\to B^n}(\rho_x^{A'^n}), \tag{29} $$

where $\mathcal{N}^{A'^n\to B^n} = (\mathcal{N}^{A'\to B})^{\otimes n}$, and to find out the classical message that Alice sent for him, Bob applies
a POVM $\{\Lambda_y^{B^n}\}$, $y \in \{1,2,\ldots,|\mathcal{Y}|\}$, and the outcome of the measurement process is modeled by a random
variable $Y$ where

$$ P_{Y|X}(y|x) = \Pr\{Y = y\,|\,X = x\} = \mathrm{Tr}\,\Lambda_y^{B^n}\,\mathcal{N}^{A'^n\to B^n}(\rho_x^{A'^n}). \tag{30} $$

The random variable $Y$ is then further processed (decoded) by Bob to yield $\hat{S}$ as an estimate of the message
$S$. The average probability of error is given by $\Pr\{S \neq \hat{S}\}$.

If the above communication protocol achieves an average probability of error not larger than $\varepsilon$, then we
shall refer to such a protocol as an $(n, \mathcal{R}, \varepsilon)$ code.

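We first prove an inequality similar to the Holevo bound for $\mathcal{K}^{(c)}$.

Lemma 3 (Holevo-like bound for $\mathcal{K}^{(c)}$). For any $n$ and any POVM $\{\Lambda_y^{B^n}\}$, we have

$$ \mathcal{K}^{(c)}(X;Y) \leq \mathcal{K}^{(c)}(M;B^n)_\rho. \tag{31} $$

Proof. We prove it for $n = 1$ and the extension to any $n$ is straightforward. Consider an ancilla quantum
system $C$ that is uncorrelated with the system $MB$, so that the joint state of $MBC$ is given by

$$ \rho^{MBC} = \sum_x P_X(x)\,|x\rangle\langle x|^{M}\otimes\mathcal{N}^{A'\to B}(\rho_x^{A'})\otimes|1\rangle\langle 1|^{C}, \tag{32} $$

where $\{|i\rangle^{C}\}$, $i = 1,2,\ldots,|C|$, is an orthonormal basis in $\mathcal{H}_C$. Let $\mathcal{E}^{BC\to B'C'}$ be a CPTP map with Kraus
operators $\{\sqrt{\Lambda_y}\otimes U_y\}$, $y = 1,2,\ldots,|\mathcal{Y}|$, where $U_y$ is a unitary matrix such that $U_y|1\rangle^{C} = |y\rangle^{C'}$. The state
after applying the map $\mathcal{E}^{BC\to B'C'}$ is given by

$$ \rho^{MB'C'} = \sum_{x,y} P_X(x)\,|x\rangle\langle x|^{M}\otimes\sqrt{\Lambda_y}\,\mathcal{N}^{A'\to B}(\rho_x^{A'})\sqrt{\Lambda_y}\otimes|y\rangle\langle y|^{C'}. \tag{33} $$

We now have the following inequalities

$$ \mathcal{K}^{(c)}(M;B)_\rho \overset{a}{\geq} \mathcal{K}^{(c)}(M;BC)_\rho \tag{34} $$

$$ \geq \mathcal{K}^{(c)}(M;B'C')_\rho \tag{35} $$

$$ \geq \mathcal{K}^{(c)}(M;C')_\rho, \tag{36} $$

where $a$ follows from (16) and the remaining steps follow from the data processing inequality of Lemma 1. Note that

$$ \rho^{MC'} = \sum_{x,y} P_X(x)P_{Y|X}(y|x)\,|x\rangle\langle x|^{M}\otimes|y\rangle\langle y|^{C'}. \tag{37} $$

Let $\Pi_y = |y\rangle\langle y|^{C'}$ and let us define a quantum operation $\mathcal{F}$ on the system $C'$ with Kraus operators $\{\Pi_y\}$.
Since $\mathcal{F}^{C'\to C''}(\rho^{MC'}) = \rho^{MC'}$, using the data processing inequality, for any $\sigma^{C'} \in \mathcal{S}(\mathcal{H}_{C'})$, we get

$$ \mathcal{D}(\rho^{MC'}\|\rho^{M}\otimes\sigma^{C'}) \geq \mathcal{D}[\rho^{MC'}\|\rho^{M}\otimes\mathcal{F}^{C'\to C''}(\sigma^{C'})]. \tag{38} $$

This indicates that for the minimization, one may consider only those $\sigma^{C'}$ that have $\{|y\rangle^{C'}\}$ as the eigenvectors,
which leads us to the classical divergence in (26) and hence,

$$ \mathcal{K}^{(c)}(M;C')_\rho = \mathcal{K}^{(c)}(X;Y). \tag{39} $$

QED. □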

We now prove a theorem that allows us to derive the various converses.

Theorem 1. For $\varepsilon < 1 - e^{-n\mathcal{R}}$, any $(n, \mathcal{R}, \varepsilon)$ code satisfies

$$ d^{(c)}(1-\varepsilon\|e^{-n\mathcal{R}}) \leq \mathcal{K}^{(c)}(M;B^n)_\rho. \tag{40} $$

Proof. We have the following inequalities

$$ \mathcal{K}^{(c)}(M;B^n)_\rho \overset{a}{\geq} \mathcal{K}^{(c)}(X;Y) \tag{41} $$

$$ \overset{b}{\geq} \mathcal{K}^{(c)}(S;\hat{S}) \tag{42} $$

$$ \overset{c}{\geq} d^{(c)}(\Pr\{S = \hat{S}\}\|e^{-n\mathcal{R}}) \tag{43} $$

$$ \overset{d}{\geq} d^{(c)}(1-\varepsilon\|e^{-n\mathcal{R}}), \tag{44} $$

where $a$ follows from Lemma 3, $b$ follows from Lemma 2, $c$ follows by applying the classical transformation
$(S,\hat{S}) \to \delta_{S,\hat{S}}$, where $\delta_{x,y} = 1$ if $x = y$ and $0$ otherwise, and $d$ follows from Property 3 in Sec. II pertaining
to $d^{(c)}$. □

It may be worth mentioning that the constraint $\varepsilon < 1 - e^{-n\mathcal{R}}$ does not weaken the strong
converse because, if the constraint is violated, i.e., $\varepsilon \geq 1 - e^{-n\mathcal{R}}$, then this by itself implies an exponential
convergence of $\varepsilon$ to 1, where $\mathcal{R}$ is bounded from below (since we are proving the converse) by the channel
capacity.

We are now in a position to apply Theorem 1 to yield the various converses.



B. New proof of the Ogawa and Nagaoka converse 



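We assume that $n = 1$, which is clearly the most general case. Take $\mathcal{D}$ to be $D_\lambda$, the Rényi divergence of
order $\lambda \in [0,2]\setminus\{1\}$, in Theorem 1 to get

$$ d^{(c)}(1-\varepsilon\|e^{-\mathcal{R}}) \leq \inf_{\sigma^{B}\in\mathcal{S}(\mathcal{H}_B)} D_\lambda(\rho^{MB}\|\rho^{M}\otimes\sigma^{B}). \tag{45} $$

For a cq-state, we note from Ref. [17] that

$$ D_\lambda(\rho^{MB}\|\rho^{M}\otimes\sigma^{B}) = D_\lambda(\rho^{MB}\|\rho^{M}\otimes\sigma^{*}) + D_\lambda(\sigma^{*}\|\sigma^{B}), \tag{46} $$

where

$$ \sigma^{*} = \frac{\zeta^{B}}{\mathrm{Tr}\,\zeta^{B}} \quad \text{with} \quad \zeta^{B} = \Big[\sum_x P_X(x)\,(\rho_x^{B})^{\lambda}\Big]^{1/\lambda}, \quad \rho_x^{B} = \mathcal{N}^{A'\to B}(\rho_x^{A'}). \tag{47} $$

Hence, the minimum of the RHS of (45) is achieved at $\sigma^{B} = \sigma^{*}$. Substituting in (45), we get

$$ d^{(c)}(1-\varepsilon\|e^{-\mathcal{R}}) \leq \frac{\lambda}{1-\lambda}\,E_0(\lambda^{-1}-1,\mathcal{N}^{A'\to B})_{\{P_X(x),\rho_x^{A'}\}}, \tag{48} $$

where

$$ E_0(s,\mathcal{N}^{A'\to B})_{\{P_X(x),\rho_x^{A'}\}} := -\log\mathrm{Tr}\Big\{\sum_x P_X(x)\big[\mathcal{N}^{A'\to B}(\rho_x^{A'})\big]^{1/(s+1)}\Big\}^{s+1}. \tag{49} $$

Using the inequality

$$ d^{(c)}(1-\varepsilon\|e^{-n\mathcal{R}}) \geq \frac{\lambda}{\lambda-1}\log(1-\varepsilon) + n\mathcal{R}, \tag{50} $$

we get for $\lambda \in (1,2]$, $s = \lambda^{-1} - 1$ and hence $s \in [-1/2, 0)$, that

$$ \varepsilon \geq 1 - \exp\Big\{-\Big[-s\mathcal{R} + E_0(s,\mathcal{N}^{A'\to B})_{\{P_X(x),\rho_x^{A'}\}}\Big]\Big\}. \tag{51} $$

The rest of the treatment is the same as in Ref. [16]. For an $(n, \mathcal{R}, \varepsilon)$ code, let Alice send unentangled inputs
across the channel uses, i.e., the ensemble across the $n$ channel uses is given by

$$ \Big\{\prod_{i=1}^{n} P_X(x_i),\ \bigotimes_{i=1}^{n}\mathcal{N}^{A'_i\to B_i}(\rho_{x_i}^{A'_i})\Big\}, \quad x_i \in \{1,2,\ldots,|\mathcal{X}|\},\ \forall i. \tag{52} $$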
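The scalar inequality (50), which converts the binary Rényi divergence into the error exponent, can be sanity-checked numerically. The following is a minimal sketch under the assumption that $d^{(c)}$ is evaluated for the Rényi divergence of order $\lambda \in (1,2]$ on the two-point distributions $(1-\varepsilon,\varepsilon)$ and $(e^{-n\mathcal{R}}, 1-e^{-n\mathcal{R}})$; the function names are ours.

```python
import math

def binary_renyi(eps, beta, lam):
    """D_lam between (1 - eps, eps) and (beta, 1 - beta), natural log, lam != 1."""
    total = (1 - eps) ** lam * beta ** (1 - lam) + eps ** lam * (1 - beta) ** (1 - lam)
    return math.log(total) / (lam - 1)

def exponent_lower_bound(eps, n, rate, lam):
    """Right-hand side of (50): lam/(lam-1) * log(1 - eps) + n * rate."""
    return lam / (lam - 1) * math.log(1 - eps) + n * rate

def inequality_holds(eps, n, rate, lam):
    """Check (50) for one parameter choice, with a small numerical tolerance."""
    beta = math.exp(-n * rate)
    return binary_renyi(eps, beta, lam) >= exponent_lower_bound(eps, n, rate, lam) - 1e-12

print(all(inequality_holds(eps, n, 0.7, lam)
          for eps in (0.1, 0.5, 0.9)
          for n in (1, 5, 20)
          for lam in (1.1, 1.5, 2.0)))
```

The inequality holds because, for $\lambda > 1$, dropping the non-negative second term inside the logarithm can only decrease the divergence.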



Theorem 2. For an $(n, \mathcal{R}, \varepsilon)$ code and for all $n \in \mathbb{N}$ with unentangled inputs, the following lower bound holds

$$ \varepsilon \geq 1 - \exp\Big\{-n\Big[-s\mathcal{R} + E_0(s,\mathcal{N}^{A'\to B})_{\{P_X(x),\rho_x^{A'}\}}\Big]\Big\}. \tag{53} $$

It is also shown in Ref. [16], and is not too difficult to check, that

$$ \frac{\partial E_0(s,\mathcal{N}^{A'\to B})_{\{P_X(x),\rho_x^{A'}\}}}{\partial s}\bigg|_{s=0} = I(M;B)_\rho, \tag{54} $$

and if $\mathcal{R} > C^{(1)}(\mathcal{N})$ [35], where

$$ C^{(1)}(\mathcal{N}) = \max_{\{P_X(x),\rho_x^{A'}\}} I(M;B)_\rho, \tag{55} $$

then $\exists\, t \in [-1/2, 0)$ such that $\forall\, s \in (t, 0)$,

$$ -s\mathcal{R} + E_0(s,\mathcal{N}^{A'\to B})_{\{P_X(x),\rho_x^{A'}\}} > 0. \tag{56} $$

Hence, using (53), the probability of error approaches 1 exponentially.

Note that we had to confine $s$ to $[-1/2, 0)$ instead of $[-1, 0)$ since the monotonicity of the quantum Rényi
divergence of order $\lambda$ is known to hold for $\lambda \in [0,2]$ [28], [30]. This does not, however, affect the strong
converse proof since Lemma 3 in Ref. [16] would still hold.



C. Wolfowitz converse 

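Again, we first assume $n = 1$ before going to any $n$. Take $\mathcal{D}$ to be the hockey-stick divergence. We first
note that

$$ d^{(c)}(1-\varepsilon\|e^{-\mathcal{R}}) = (1-\varepsilon-\gamma e^{-\mathcal{R}})^{+} + [\varepsilon - \gamma(1-e^{-\mathcal{R}})]^{+} \tag{57} $$

$$ \geq 1 - \varepsilon - \gamma e^{-\mathcal{R}} \tag{58} $$

and hence, using Theorem 1, we have

$$ \varepsilon \geq 1 - \mathcal{K}^{(c)}(M;B)_\rho - \gamma e^{-\mathcal{R}}. \tag{59} $$

Note that

$$ \mathcal{K}^{(c)}(M;B)_\rho \leq \mathrm{Tr}\,P_{\{\rho^{MB}-\gamma\rho^{M}\otimes\rho^{B}\geq 0\}}\,\rho^{MB}. \tag{60} $$

We now give an upper bound for the RHS of the above equation that is reminiscent of Chebyshev's
inequality in the classical setting. Using Lemma 6, we get for $\log\gamma > I(M;B)_\rho$,

$$ \mathrm{Tr}\,P_{\{\rho^{MB}-\gamma\rho^{M}\otimes\rho^{B}\geq 0\}}\,\rho^{MB} \leq \frac{\mathrm{Tr}\,\rho^{MB}\big[\log\rho^{MB} - \log(\rho^{M}\otimes\rho^{B})\big]^{2} - [I(M;B)_\rho]^{2}}{[\log\gamma - I(M;B)_\rho]^{2}}. \tag{61} $$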
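In the commuting case, the bound (61) reduces to Chebyshev's inequality for the information density $\log[P_{XY}/(P_X P_Y)]$. The following is a minimal Python sketch of that classical special case only; the joint distribution and function names are hypothetical, chosen for illustration.

```python
import math

def information_density(joint):
    """Return (probability, density) pairs with density = log P_XY / (P_X P_Y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return [(pxy, math.log(pxy / (px[x] * py[y])))
            for x, row in enumerate(joint)
            for y, pxy in enumerate(row) if pxy > 0]

def tail_and_bound(joint, gamma):
    """Left- and right-hand sides of the classical analogue of (61)."""
    pairs = information_density(joint)
    mutual_info = sum(p * i for p, i in pairs)
    variance = sum(p * i * i for p, i in pairs) - mutual_info ** 2
    log_gamma = math.log(gamma)
    if log_gamma <= mutual_info:
        raise ValueError("the bound requires log(gamma) > I(X;Y)")
    # Probability mass where P_XY >= gamma * P_X * P_Y.
    tail = sum(p for p, i in pairs if i >= log_gamma)
    return tail, variance / (log_gamma - mutual_info) ** 2

joint = [[0.4, 0.1], [0.1, 0.4]]    # a noisy binary channel with uniform input
for gamma in (1.3, 1.5, 2.0, 5.0):
    tail, bound = tail_and_bound(joint, gamma)
    print(gamma, tail, bound)
```

For a single channel use the bound is loose and may exceed 1; it becomes useful when the variance grows linearly with the number of uses while the threshold gap grows linearly as well.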

Define for any $n \in \mathbb{N}$,

$$ A_n^{(c)} := \max_{\{P_X(x),\rho_x^{A'^n}\}} \Big\{\mathrm{Tr}\,\rho^{MB^n}\big[\log\rho^{MB^n} - \log(\rho^{M}\otimes\rho^{B^n})\big]^{2} - [I(M;B^n)_\rho]^{2}\Big\}. \tag{62} $$

This quantity (without the maximization over the channel input) has been known in the classical case as the
information variance and was defined by Shannon (see Ref. [23] for more details). The finiteness of $A_1^{(c)}$
follows from Lemma 9. Using Theorem 1, (58) and (61), we get

$$ \varepsilon \geq 1 - \frac{A_1^{(c)}}{[\log\gamma - I(M;B)_\rho]^{2}} - \gamma e^{-\mathcal{R}}. \tag{63} $$

For an $(n, \mathcal{R}, \varepsilon)$ code and unentangled inputs described in (52), it is not difficult to show that $A_n^{(c)} = nA_1^{(c)}$.
Then choosing $\log\gamma = nC^{(1)}(\mathcal{N}) + n\delta$ for some $\delta > 0$, where $C^{(1)}(\mathcal{N})$ is defined in (55), we get for this
$(n, \mathcal{R}, \varepsilon)$ code

$$ \varepsilon \geq 1 - \frac{A_1^{(c)}}{n\delta^{2}} - e^{-n[\mathcal{R} - C^{(1)}(\mathcal{N}) - \delta]}. \tag{64} $$

Choosing $\delta = [\mathcal{R} - C^{(1)}(\mathcal{N})]/2$, we get the following result.

Theorem 3. For an $(n, \mathcal{R}, \varepsilon)$ code with the ensemble given in (52), the following lower bound holds

$$ \varepsilon \geq 1 - \frac{4A_1^{(c)}}{n[\mathcal{R} - C^{(1)}(\mathcal{N})]^{2}} - e^{-n[\mathcal{R} - C^{(1)}(\mathcal{N})]/2}. \tag{65} $$

Note that (65) has the same form as the classical Wolfowitz strong converse (see Ref. [?]).
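To get a feel for how the bound of Theorem 3 behaves, one can evaluate its right-hand side, $1 - 4A/(n\,\Delta^2) - e^{-n\Delta/2}$ with $\Delta = \mathcal{R} - C^{(1)}(\mathcal{N})$, for increasing $n$. The sketch below is purely illustrative: the rate, capacity and variance values are made up, and `wolfowitz_error_lower_bound` is a name we introduce here.

```python
import math

def wolfowitz_error_lower_bound(n, rate, capacity, variance):
    """Right-hand side of the Wolfowitz-like bound, with gap = rate - capacity.

    rate and capacity are in nats per channel use; variance stands for the
    single-use information variance A_1^(c).
    """
    gap = rate - capacity
    if gap <= 0:
        raise ValueError("the bound is stated for rates above capacity")
    return 1 - 4 * variance / (n * gap ** 2) - math.exp(-n * gap / 2)

# Hypothetical numbers: rate 1.0, capacity 0.9, variance 1.0.
bounds = [wolfowitz_error_lower_bound(n, 1.0, 0.9, 1.0) for n in (10**3, 10**4, 10**5, 10**6)]
print(bounds)  # increases towards 1 as n grows
```

The $1/n$ term dominates for large $n$, which is why this bound shows convergence of the error to 1 but not an exponential rate.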



IV. QUANTUM INFORMATION OVER QUANTUM CHANNEL 
A. Information processing task and converse 

Suppose a quantum system $S$ and a reference system $A$ have a state $|\phi\rangle^{AS}$. Alice only has access to the
system $S$ and not to $A$. Alice wants to send her part of the shared state with $A$ to Bob using $n$ independent
uses of a quantum channel $\mathcal{N}^{A'\to B}$ such that at the end of the communication protocol, Bob's shared
state with the reference $A$ is arbitrarily close to the state Alice shared with $A$. We call $\mathcal{R}$ the
communication rate, which is given by

$$ \mathcal{R} := \frac{\log|S|}{n}. \tag{66} $$

We shall assume that the state of $S$ is given by $e^{-n\mathcal{R}}\mathbb{1}$, i.e., a completely mixed state.
To this end, Alice performs an encoding operation given by $\mathcal{E}^{S\to A'^n}$ to get

$$ \rho^{AA'^n} = \mathcal{E}^{S\to A'^n}(\phi^{AS}). \tag{67} $$

Alice transmits the system $A'^n$ over $\mathcal{N}^{A'^n\to B^n} = (\mathcal{N}^{A'\to B})^{\otimes n}$ and Bob receives the state

$$ \rho^{AB^n} = \mathcal{N}^{A'^n\to B^n}\big[\mathcal{E}^{S\to A'^n}(\phi^{AS})\big]. \tag{68} $$

Bob applies a decoding operation $\mathcal{T}^{B^n\to\hat{S}}$ on his part of the received state to get

$$ \rho^{A\hat{S}} = \mathcal{T}^{B^n\to\hat{S}}\Big\{\mathcal{N}^{A'^n\to B^n}\big[\mathcal{E}^{S\to A'^n}(\phi^{AS})\big]\Big\}. \tag{69} $$

The performance of the protocol is quantified by the fidelity given by

$$ F(\phi^{AS},\rho^{A\hat{S}}) = \langle\phi|^{AS}\,\rho^{A\hat{S}}\,|\phi\rangle^{AS}. \tag{70} $$

If we are given that the protocol achieves a fidelity not smaller than $\mathsf{F}$, then we shall refer to such a protocol
as an $(n, \mathcal{R}, 1-\mathsf{F})$ code.

The maximum rate per channel use for this protocol in the limit of a large number of channel uses and
fidelity arbitrarily close to 1 was proved in a series of papers [36-43]. Let the coherent information of the
channel $\mathcal{N}^{A'\to B}$ be defined as

$$ Q(\mathcal{N}) := \max_{\rho^{AA'}} I(A\rangle B)_\sigma, \tag{71} $$

where $\sigma^{AB} = \mathcal{N}^{A'\to B}(\rho^{AA'})$. The capacity of the channel is now given by the regularization

$$ Q_{\mathrm{reg}}(\mathcal{N}) := \lim_{n\to\infty}\frac{Q(\mathcal{N}^{\otimes n})}{n}. \tag{72} $$
We now prove a theorem that would give us one-shot inequalities between the fidelity and the rate. 

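Theorem 4. For $\mathsf{F} > e^{-n\mathcal{R}}$, any $(n, \mathcal{R}, 1-\mathsf{F})$ code satisfies

$$ d^{(q)}(\mathsf{F}\|e^{-n\mathcal{R}}) \leq \mathcal{K}^{(q)}(A;B^n)_\rho. \tag{73} $$

Proof. Let $\{|i\rangle^{AS}\}$ be an orthonormal basis for $\mathcal{H}_{AS}$ with $|1\rangle^{AS} = |\phi\rangle^{AS}$. Consider a CPTP quantum map
$\mathcal{F}^{AS\to C}$, where $|C| = 2$, with Kraus operators $|0\rangle^{C}\langle 1|^{AS}$ and $\{|1\rangle^{C}\langle i|^{AS}\}$, $i = 2,3,\ldots,|AS|$. Let $\Pi_0^{C} = |0\rangle\langle 0|^{C}$
and $\Pi_1^{C} = |1\rangle\langle 1|^{C}$. Then we have

$$ \mathcal{F}(\rho^{A\hat{S}}) = F'\,\Pi_0^{C} + (1-F')\,\Pi_1^{C}, \tag{74} $$

$$ \mathcal{F}(\mathbb{1}\otimes\sigma^{\hat{S}}) = e^{-n\mathcal{R}}\,\Pi_0^{C} + (e^{n\mathcal{R}} - e^{-n\mathcal{R}})\,\Pi_1^{C}, \tag{75} $$

where $F' = \langle\phi|^{AS}\rho^{A\hat{S}}|\phi\rangle^{AS}$ and (75) holds for all quantum states $\sigma^{\hat{S}}$. We now have the following inequalities

$$ \mathcal{K}^{(q)}(A;B^n)_\rho \overset{a}{\geq} \mathcal{K}^{(q)}(A;\hat{S})_\rho \tag{76} $$

$$ = \inf_{\sigma^{\hat{S}}}\mathcal{D}(\rho^{A\hat{S}}\|\mathbb{1}\otimes\sigma^{\hat{S}}) \tag{77} $$

$$ \overset{b}{\geq} \inf_{\sigma^{\hat{S}}}\mathcal{D}\big[F'\,\Pi_0^{C} + (1-F')\,\Pi_1^{C}\,\big\|\,e^{-n\mathcal{R}}\,\Pi_0^{C} + (e^{n\mathcal{R}} - e^{-n\mathcal{R}})\,\Pi_1^{C}\big] \tag{78} $$

$$ \overset{c}{=} d^{(q)}(F'\|e^{-n\mathcal{R}}) \tag{79} $$

$$ \overset{d}{\geq} d^{(q)}(\mathsf{F}\|e^{-n\mathcal{R}}), \tag{80} $$

where $a$ and $b$ follow from the data processing inequality, $c$ follows since the quantity $d^{(q)}(F'\|e^{-n\mathcal{R}})$ is
independent of $\sigma^{\hat{S}}$, and $d$ follows from Property 3 in Sec. II pertaining to $d^{(q)}$. □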
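The binary quantity $d^{(q)}$ appearing in Theorem 4 is easy to evaluate explicitly. The sketch below computes it for the Rényi and hockey-stick divergences on the pair $\rho = F\Pi_0 + (1-F)\Pi_1$, $\sigma = \beta\Pi_0 + (1/\beta - \beta)\Pi_1$ from Property 3, and compares it with the simple lower bounds used in the next subsections. It is an illustration under our own naming and sample values, not code from the paper.

```python
import math

def d_q_renyi(fidelity, beta, lam):
    """d^(q)(F || beta) for the Renyi divergence of order lam != 1."""
    tail = 1 / beta - beta                 # weight of sigma on Pi_1
    total = fidelity ** lam * beta ** (1 - lam)
    if fidelity < 1 and tail > 0:
        total += (1 - fidelity) ** lam * tail ** (1 - lam)
    return math.log(total) / (lam - 1)

def d_q_hockey_stick(fidelity, beta, gamma):
    """d^(q)(F || beta) for the hockey-stick divergence Tr(rho - gamma*sigma)^+."""
    tail = 1 / beta - beta
    return max(fidelity - gamma * beta, 0.0) + max(1 - fidelity - gamma * tail, 0.0)

n_rate = 2.0                               # stands for n * R, in nats
beta = math.exp(-n_rate)
lam, gamma = 1.5, 1.2
fidelities = (0.2, 0.5, 0.9)
renyi = [d_q_renyi(f, beta, lam) for f in fidelities]
hockey = [d_q_hockey_stick(f, beta, gamma) for f in fidelities]
print(renyi, hockey)
```

Both lists increase with the fidelity for $F \geq \beta$, which is the monotonicity that step $d$ of the proof relies on.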

We now give upper bounds to the fidelity using the Rényi and the hockey-stick divergences.

B. Quantum Error exponent

Let us first assume that $n = 1$. Take $\mathcal{D}$ to be the Rényi divergence of order $\lambda \in [0,2]\setminus\{1\}$. We first note
that

$$ d^{(q)}(\mathsf{F}\|e^{-\mathcal{R}}) \geq \frac{\lambda}{\lambda-1}\log\mathsf{F} + \mathcal{R}, \tag{81} $$

and it follows from Lemma 7 that

$$ \mathcal{K}^{(q)}(A;B)_\rho \leq \frac{\lambda}{1-\lambda}\,E_0(\lambda^{-1}-1,\mathcal{N}^{A'\to B})_\rho, \tag{82} $$

where for $s = \lambda^{-1} - 1$, $s \in [-1/2, 0)$,

$$ E_0(s,\mathcal{N}^{A'\to B})_\rho := -\log\mathrm{Tr}\Big\{\mathrm{Tr}_A\big[\mathcal{N}^{A'\to B}(\rho^{AA'})\big]^{1/(1+s)}\Big\}^{s+1}, \tag{83} $$

whose properties are studied by the following theorem.

Theorem 5. For any quantum state $\sigma^{AB}$, the function

$$ g(s) := -\log\mathrm{Tr}\Big[\mathrm{Tr}_A\,(\sigma^{AB})^{1/(1+s)}\Big]^{s+1}, \quad s \in [-1/2, 0), \tag{84} $$

satisfies

$$ g(0) = 0, \tag{85} $$

$$ \frac{dg(s)}{ds}\bigg|_{s=0} = I(A\rangle B)_\sigma, \tag{86} $$

and $g(s) + (s+1)\log|A|$ is an increasing function in $s$.

Proof. See appendix. □

Using (73), we get the one-shot bound on the fidelity as

$$ \mathsf{F} \leq \exp\Big\{-\Big[-s\mathcal{R} + E_0(s,\mathcal{N}^{A'\to B})_\rho\Big]\Big\}. \tag{87} $$
One could provide a sufficient condition for the strong converse to exist as an additivity question. First define

$$ E_0^{*}(s,\mathcal{N}) := \min_{\rho^{AA'}} E_0(s,\mathcal{N})_\rho, \tag{88} $$

where we just abbreviate $\mathcal{N}$ for $\mathcal{N}^{A'\to B}$. Then one could make the following statement. For $\mathcal{R} > Q_{\mathrm{reg}}(\mathcal{N})$,
the strong converse holds for all inputs if for all $m, n \in \mathbb{N}$, $\exists\, t \in [-1/2, 0)$ such that $\forall\, s \in (t, 0)$,

$$ E_0^{*}(s,\mathcal{N}^{\otimes (n+m)}) = E_0^{*}(s,\mathcal{N}^{\otimes n}) + E_0^{*}(s,\mathcal{N}^{\otimes m}). \tag{89} $$

This statement is easy to prove. Using (87), we have

$$ \mathsf{F} \leq \exp\Big\{-n\Big[-s\mathcal{R} + \frac{E_0^{*}(s,\mathcal{N}^{\otimes n})}{n}\Big]\Big\}. \tag{90} $$

If (89) is satisfied, then $\forall\, s \in (t, 0)$,

$$ \frac{E_0^{*}(s,\mathcal{N}^{\otimes n})}{n} = E_0^{*}(s,\mathcal{N}). \tag{91} $$

It follows from Lemma 8 that $\exists\, t' \in [-1/2, 0)$ such that $\forall\, s \in (t', 0)$, $-s\mathcal{R} + E_0^{*}(s,\mathcal{N}) > 0$. Hence $\forall\,
s \in (\max\{t, t'\}, 0)$ and $\forall\, n$,

$$ -s\mathcal{R} + \frac{E_0^{*}(s,\mathcal{N}^{\otimes n})}{n} > 0 \tag{92} $$

and is independent of $n$. (89) is unlikely to hold in general. Observe that if the dimension of the quantum
system $A$ is not constraining, then (89) implies the additivity of the coherent information of the channel. To
see this, divide (89) by $s$ and take the limit $s \uparrow 0$, invoke Theorem 5, and since $s < 0$, the minimum would be
replaced by a maximum over the input states. It would be interesting to find out if there is a class of channels
for which (89) holds.



C. Wolfowitz converse 

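Take $\mathcal{D}$ to be the hockey-stick divergence. Following the same steps as in Sec. III C, we get for $n = 1$

$$ d^{(q)}(\mathsf{F}\|e^{-\mathcal{R}}) \geq \mathsf{F} - \gamma e^{-\mathcal{R}}, \tag{93} $$

$$ \mathcal{K}^{(q)}(A;B)_\rho \leq \mathrm{Tr}\,P_{\{\rho^{AB}-\gamma\mathbb{1}\otimes\rho^{B}\geq 0\}}\,\rho^{AB}. \tag{94} $$

We upper bound the RHS of the above equation using Lemma 6, for $\log\gamma > I(A\rangle B)_\rho$, as

$$ \mathrm{Tr}\,P_{\{\rho^{AB}-\gamma\mathbb{1}\otimes\rho^{B}\geq 0\}}\,\rho^{AB} \leq \frac{\mathrm{Tr}\,\rho^{AB}\big[\log\rho^{AB} - \log(\mathbb{1}\otimes\rho^{B})\big]^{2} - [I(A\rangle B)_\rho]^{2}}{[\log\gamma - I(A\rangle B)_\rho]^{2}}, \tag{95} $$

and we get

$$ \mathsf{F} \leq \frac{A_n^{(q)}}{[\log\gamma - I(A\rangle B^n)_\rho]^{2}} + \gamma e^{-n\mathcal{R}}, \tag{96} $$

where

$$ A_n^{(q)} = \max_{\rho^{AA'^n}} \Big\{\mathrm{Tr}\,\rho^{AB^n}\big[\log\rho^{AB^n} - \log(\mathbb{1}\otimes\rho^{B^n})\big]^{2} - [I(A\rangle B^n)_\rho]^{2}\Big\}. \tag{97} $$

For an $(n, \mathcal{R}, 1-\mathsf{F})$ code, choosing $\log\gamma = nQ_{\mathrm{reg}}(\mathcal{N}) + n\delta$ for some $\delta > 0$, we get an upper bound

$$ \mathsf{F} \leq \frac{A_n^{(q)}}{n^{2}\delta^{2}} + \exp\{-n[\mathcal{R} - Q_{\mathrm{reg}}(\mathcal{N}) - \delta]\}. \tag{98} $$

Choosing $\delta = [\mathcal{R} - Q_{\mathrm{reg}}(\mathcal{N})]/2$, we get

$$ \mathsf{F} \leq \frac{4A_n^{(q)}}{n^{2}[\mathcal{R} - Q_{\mathrm{reg}}(\mathcal{N})]^{2}} + \exp\Big\{-\frac{n}{2}[\mathcal{R} - Q_{\mathrm{reg}}(\mathcal{N})]\Big\}. \tag{99} $$

Hence, a sufficient condition for the strong converse to hold is that $A_n^{(q)}/n^{2} \to 0$ as $n \to \infty$.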
D. Strong converse for the quantum erasure channel for maximally entangled channel inputs

A quantum erasure channel $\mathcal{N}^{A'\to B}$, defined in Ref. [44], is given by the following Kraus operators:
$\{\sqrt{1-p}\,|i\rangle^{B}\langle i|^{A'}, \sqrt{p}\,|e\rangle^{B}\langle i|^{A'}\}$, $i = 1,\ldots,|A'|$, where $p \in [0,1]$, $|B| = |A'| + 1$, $\{|i\rangle^{A'}\}$ and $\{|i\rangle^{B}\}$ are orthonormal
bases in $\mathcal{H}_{A'}$ and $\mathcal{H}_B$ respectively, and $|e\rangle^{B} = |j\rangle^{B}$ for $j = |B|$. The action of the channel can be
understood as follows

$$ \mathcal{N}^{A'\to B}(\rho^{AA'}) = (1-p)\,\sigma^{AB} + p\,\rho^{A}\otimes|e\rangle\langle e|^{B}. \tag{100} $$

Let $\sigma^{AB} = \mathcal{G}^{A'\to B}(\rho^{AA'})$, where $\mathcal{G}$ increases the dimension but leaves the state intact. Then with probability
$1-p$, the channel leaves the state as $\sigma^{AB}$ and with probability $p$ it erases the state and replaces it by $|e\rangle^{B}$. It is
not difficult to see that $\sigma^{AB}$ is orthogonal to $\rho^{A}\otimes|e\rangle\langle e|^{B}$.

One could carry over this observation to 2 channel uses. Let $\sigma^{AB_1B_2} = (\mathcal{G}^{A'\to B})^{\otimes 2}(\rho^{AA'_1A'_2})$. Observe that

$$ (\mathcal{N}^{A'_1\to B_1}\otimes\mathcal{N}^{A'_2\to B_2})(\rho^{AA'_1A'_2}) = (1-p)^{2}\sigma^{AB_1B_2} + p(1-p)\,\sigma^{AB_1}\otimes|e\rangle\langle e|^{B_2} + p(1-p)\,\sigma^{AB_2}\otimes|e\rangle\langle e|^{B_1} + p^{2}\rho^{A}\otimes|e\rangle\langle e|^{B_1}\otimes|e\rangle\langle e|^{B_2}, \tag{101} $$

where we use the usual notation for the reduced density matrices, i.e., for example, $\sigma^{AB_1}$ is the
result of the partial trace over $B_2$ of $\sigma^{AB_1B_2}$. Each of these four matrices is orthogonal to the others, and we have
an abuse of notation in the third term by rearranging the order of the systems.

Taking this further for $n$ channel uses, let $\sigma^{AB^n} = (\mathcal{G}^{A'\to B})^{\otimes n}(\rho^{AA'^n})$. The output can be written as the
sum of $2^n$ orthogonal density matrices, where each of these matrices results from $i$ erasures, $i \in \{0,\ldots,n\}$, and
this occurs with probability $(1-p)^{n-i}p^{i}$. The number of states that have suffered exactly $i$ erasures is $\binom{n}{i}$.

Let $B_{i_1}B_{i_2}\cdots B_{i_{n-k}}$ be the quantum systems that have not suffered erasures. We could write the state
in this case using $\sigma$ as

$$ \tau^{AB_{i_1}B_{i_2}\cdots B_{i_n}} = \sigma^{AB_{i_1}B_{i_2}\cdots B_{i_{n-k}}}\otimes|e\rangle\langle e|^{B_{i_{n-k+1}}\cdots B_{i_n}}. \tag{102} $$

It now follows that

$$ \rho^{AB^n} = \sum_{2^n\ \mathrm{terms}} a_{k,n}\,\tau^{AB_{i_1}B_{i_2}\cdots B_{i_n}}, \tag{103} $$

where

$$ a_{k,n} = (1-p)^{n-k}p^{k}. \tag{104} $$

To prove the strong converse, we find an upper bound for $\mathcal{K}^{(q)}(A;B^n)$. We assume that $\rho^{AA'^n}$ is a maximally
entangled state with a Schmidt rank of $|A|$. Let the channel input be $d_A$ dimensional, so that for $n$ channel uses
$|A| = d_A^{n}$. It is known that $Q(\mathcal{N}) = (1-2p)^{+}\log d_A$ is the single-letter quantum capacity for this channel
[?] (see also Ref. [?]).

Note that $d_A^{k}\,\rho^{AA'_{i_1}\cdots A'_{i_{n-k}}}$ is a projector of rank $d_A^{k}$. This may be a well-known observation and is
not difficult to prove, but for the sake of completeness, we provide a proof in the appendix in Lemma 10.
Observe that $\rho^{A'_{i_1}\cdots A'_{i_{n-k}}}$ is the maximally mixed state. We note that the capacity-achieving input is maximally
entangled.



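For reasons that should be apparent from what follows, we also consider the $\lambda$-quasi relative entropy for $\lambda
\in [0,2]\setminus\{1\}$ given by

$$ S_\lambda(\rho\|\sigma) := \mathrm{sign}(\lambda-1)\,\mathrm{Tr}\,\rho^{\lambda}\sigma^{1-\lambda}. \tag{105} $$

This relative entropy is jointly convex in its arguments and satisfies monotonicity for the chosen range of $\lambda$
[28]. The Rényi divergence is a function of this quantity.

For the hockey-stick divergence and the $\lambda$-quasi relative entropy, note the following identical steps

$$ \mathcal{D}(\rho^{AB^n}\|\mathbb{1}\otimes\rho^{B^n}) \overset{a}{=} \sum_{2^n\ \mathrm{terms}} a_{k,n}\,\mathcal{D}(\tau^{AB_{i_1}\cdots B_{i_n}}\|\mathbb{1}\otimes\tau^{B_{i_1}\cdots B_{i_n}}) \tag{106} $$

$$ \overset{b}{=} \sum_{2^n\ \mathrm{terms}} a_{k,n}\,\mathcal{D}(\sigma^{AB_{i_1}\cdots B_{i_{n-k}}}\|\mathbb{1}\otimes\sigma^{B_{i_1}\cdots B_{i_{n-k}}}) \tag{107} $$

$$ \overset{c}{\leq} \sum_{2^n\ \mathrm{terms}} a_{k,n}\,\mathcal{D}(\rho^{AA'_{i_1}\cdots A'_{i_{n-k}}}\|\mathbb{1}\otimes\rho^{A'_{i_1}\cdots A'_{i_{n-k}}}), \tag{108} $$

where $a$ follows from the orthogonality of the $\tau$'s, $b$ follows since we have removed the tensors with $|e\rangle\langle e|$, and $c$
follows from monotonicity.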

The quantity K,^ (A; B n ) can be upper bounded by T>(p ABn \ | 1 <g> p B ™ ). For the Renyi divergence of order 
is upper bounded by first computing (I1081 l for the A-quasi relative entropy to get 



£fo>(A; 5") < ^ log [p4~ A + (1 - p)^ 1 



(109) 



Using (18~T| ), we get 



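$$F \le \exp\left\{ -n\, \frac{\lambda-1}{\lambda} \left[ \mathcal{R} - \frac{\log\left( p\, d_A^{1-\lambda} + (1-p)\, d_A^{\lambda-1} \right)}{\lambda-1} \right] \right\}. \tag{110}$$

The function
$$h(x) := \log\left[ p\, d_A^{1-x} + (1-p)\, d_A^{x-1} \right] \tag{111}$$
satisfies $h(1) = 0$. Furthermore, for $p \in [0, 1/2]$,
$$\lim_{\lambda \downarrow 1} \frac{h(\lambda)}{\lambda-1} = Q(\mathcal{N}). \tag{112}$$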
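The limit in (112) can be checked numerically. The following is a minimal sketch, assuming natural logarithms and the form $h(x) = \log[p\, d_A^{1-x} + (1-p)\, d_A^{x-1}]$; the function names `h`, `quantum_capacity` and `find_lambda` are hypothetical helpers introduced only for this illustration.

```python
import math


def h(x, p, d):
    """h(x) = log[p d^(1-x) + (1-p) d^(x-1)], natural log (an assumption)."""
    return math.log(p * d ** (1.0 - x) + (1.0 - p) * d ** (x - 1.0))


def quantum_capacity(p, d):
    """Q(N) = (1-2p)^+ log d for an erasure probability p."""
    return max(0.0, 1.0 - 2.0 * p) * math.log(d)


def find_lambda(rate, p, d, steps=1000):
    """Smallest grid point lambda in (1, 2] with rate - h(lambda)/(lambda-1) > 0.

    Returns None if no grid point satisfies the condition.
    """
    for j in range(1, steps + 1):
        lam = 1.0 + j / steps
        if rate - h(lam, p, d) / (lam - 1.0) > 0.0:
            return lam
    return None


if __name__ == "__main__":
    p, d = 0.2, 2
    eps = 1e-6
    print("h(1+eps)/eps :", h(1.0 + eps, p, d) / eps)
    print("Q(N)         :", quantum_capacity(p, d))
    print("lambda found :", find_lambda(quantum_capacity(p, d) + 0.05, p, d))
```

For a rate slightly above $Q(\mathcal{N})$ the search returns a $\lambda$ close to 1, which is the existence statement used for the strong converse.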



Hence, for all $\mathcal{R} > Q(\mathcal{N})$, $\exists\, \lambda \in (1, 2]$ s.t. $\mathcal{R} - h(\lambda)/(\lambda - 1) > 0$, and thus the strong converse holds. For $p > 1/2$, $h'(1) < 0$ and hence, using similar arguments as above, the strong converse follows.

For the hockey-stick divergence, (108) is computed as



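$$\mathcal{K}_{\gamma}(A; B^n) \le \sum_{k=0}^{n} \binom{n}{k} a_{k,n}\, \mathrm{Tr}\left( \rho^{AA'_1 \cdots A'_{n-k}} - \gamma\, \mathbb{1} \otimes \rho^{A'_1 \cdots A'_{n-k}} \right)^{+} \tag{113}$$
$$\le \sum_{k=0}^{\frac{n}{2} - \left\lfloor \frac{\log\gamma}{2\log d_A} \right\rfloor} \binom{n}{k} a_{k,n}, \tag{114}$$
where we have upper bounded $\mathrm{Tr}\left( \rho^{AA'_1 \cdots A'_{n-k}} - \gamma\, \mathbb{1} \otimes \rho^{A'_1 \cdots A'_{n-k}} \right)^{+}$ by 1 for $k < n/2 - \lfloor \log\gamma / (2\log d_A) \rfloor$.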



Let us choose $\log\gamma = n[\mathcal{R} + Q(\mathcal{N})]/2$ in (108). For $\mathcal{R} > Q(\mathcal{N})$, we have $n/2 - \lfloor \log\gamma / (2\log d_A) \rfloor < np$.
Using the Chernoff bound and (93), we get



F 



{Tt "1 ft 

--[K-Q(Af)]j+expi- — 

which gives us the strong converse. 



(2p-l)- 



+ 



n 



4 log d / 



(115) 
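The truncated binomial sum in (114) can be evaluated directly to see the decay that the Chernoff bound captures. This is a sketch under stated assumptions: the rate is expressed in units of $\log d_A$ (the hypothetical parameter `rate_ratio`), $\log\gamma = n[\mathcal{R} + Q(\mathcal{N})]/2$ as chosen above, and the sum includes the boundary index, which can only make the bound larger.

```python
import math


def cutoff(n, p, rate_ratio):
    """n/2 - floor(log(gamma) / (2 log d_A)) with log(gamma) = n (R + Q) / 2.

    rate_ratio is R / log d_A; q below is Q(N) / log d_A = (1 - 2p)^+.
    """
    q = max(0.0, 1.0 - 2.0 * p)
    return n / 2 - math.floor(n * (rate_ratio + q) / 4)


def erasure_tail(n, p, rate_ratio):
    """Sum over k <= cutoff of C(n, k) (1-p)^(n-k) p^k, cf. (114)."""
    kmax = cutoff(n, p, rate_ratio)
    return sum(
        math.comb(n, k) * (1.0 - p) ** (n - k) * p ** k
        for k in range(n + 1)
        if k <= kmax
    )


if __name__ == "__main__":
    p, rate_ratio = 0.3, 0.6  # rate above Q(N)/log d_A = 0.4
    for n in (100, 200, 400):
        print(n, cutoff(n, p, rate_ratio), erasure_tail(n, p, rate_ratio))
```

Because the cutoff lies below the mean $np$ of the number of erasures when $\mathcal{R} > Q(\mathcal{N})$, the printed tail shrinks as $n$ grows.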
Appendix A: Proof of Lemmas

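Lemma 4. Consider the matrices $\rho, \sigma \ge 0$ and a scalar $\gamma > 0$. Then for any CPTP map $\mathcal{E}$,
$$\mathrm{Tr}(\rho - \gamma\sigma)^{+} \ge \mathrm{Tr}[\mathcal{E}(\rho) - \gamma\mathcal{E}(\sigma)]^{+}. \tag{A1}$$

Proof. Let the Jordan decomposition of $\rho - \gamma\sigma$ be $Q - S$, where $Q, S \ge 0$. Let $P := P_{\{\mathcal{E}(\rho) - \gamma\mathcal{E}(\sigma) \ge 0\}}$. Then
\begin{align}
\mathrm{Tr}(\rho - \gamma\sigma)^{+} &= \mathrm{Tr}\, Q \tag{A2} \\
&\overset{a}{=} \mathrm{Tr}\, \mathcal{E}(Q) \tag{A3} \\
&\overset{b}{\ge} \mathrm{Tr}\, P[\mathcal{E}(Q) - \mathcal{E}(S)] \tag{A4} \\
&= \mathrm{Tr}[\mathcal{E}(\rho) - \gamma\mathcal{E}(\sigma)]^{+}, \tag{A5}
\end{align}
where a follows since $\mathcal{E}$ is trace preserving, and b follows since we are subtracting non-negative terms. □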
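Lemma 4 can be spot-checked numerically. The sketch below assumes one specific CPTP map, the partial trace over the second qubit of a two-qubit system, and random Wishart-type density matrices; the helper names and the choice $\gamma = 1.3$ are illustrative assumptions and are not taken from the text.

```python
import numpy as np


def random_state(dim, rng):
    """Random full-rank density matrix (normalized Wishart matrix)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def pos_part_trace(herm):
    """Tr(H)^+ : sum of the positive eigenvalues of a Hermitian matrix."""
    w = np.linalg.eigvalsh(herm)
    return float(w[w > 0].sum())


def trace_out_second(x, d1, d2):
    """Partial trace over the second subsystem (a CPTP map)."""
    return np.trace(x.reshape(d1, d2, d1, d2), axis1=1, axis2=3)


def worst_gap(trials=200, gamma=1.3, seed=0):
    """Minimum over random trials of Tr(rho - g sigma)^+ - Tr(E(rho) - g E(sigma))^+."""
    rng = np.random.default_rng(seed)
    worst = np.inf
    for _ in range(trials):
        rho, sigma = random_state(4, rng), random_state(4, rng)
        before = pos_part_trace(rho - gamma * sigma)
        after = pos_part_trace(
            trace_out_second(rho, 2, 2) - gamma * trace_out_second(sigma, 2, 2)
        )
        worst = min(worst, before - after)
    return worst


if __name__ == "__main__":
    print("smallest observed gap:", worst_gap())
```

A non-negative smallest gap (up to rounding) is what (A1) predicts for this particular map.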

Lemma 5. Let $f : \mathbb{R}^{+} \to \mathbb{R}$ be an operator monotone function. Then for $\rho, \sigma \ge 0$, $P := P_{\{\rho - \sigma \ge 0\}}$, we have
$$\mathrm{Tr}\, P\rho\, [f(\rho) - f(\sigma)] \ge 0. \tag{A6}$$

Proof. We follow arguments similar to Theorem 11.18 in Ref. [47] (see also the proof of Lemma 1 in Ref. [48]). Using Löwner's theorem (see Ref. [47]),



Mi±_^ (A ), (A7) 
x + A 







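where $\mu$ is a positive finite measure, we get
$$\mathrm{Tr}\, P\rho\, [f(\rho) - f(\sigma)] = \int \lambda(1+\lambda)\, \mathrm{Tr}\left[ P\rho\, (\rho+\lambda)^{-1} (\rho-\sigma) (\sigma+\lambda)^{-1} \right] d\mu(\lambda). \tag{A8}$$
To prove the inequality, it is sufficient to show that $\mathrm{Tr}\, P\rho\, (\rho+\lambda)^{-1}(\rho-\sigma)(\sigma+\lambda)^{-1} \ge 0$ $\forall\, \lambda > 0$. For $\Delta = \rho - \sigma$, we can get the integral representation
$$\mathrm{Tr}\, P\rho\, (\rho+\lambda)^{-1}(\rho-\sigma)(\sigma+\lambda)^{-1} = \int_0^1 \mathrm{Tr}\left[ P\rho\, (\sigma+t\Delta+\lambda)^{-1} \Delta\, (\sigma+t\Delta+\lambda)^{-1} \right] dt. \tag{A9}$$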



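Hence, it is sufficient to show that
$$\mathrm{Tr}\, P\rho\, (\sigma+t\Delta+\lambda)^{-1} \Delta\, (\sigma+t\Delta+\lambda)^{-1} \ge 0. \tag{A10}$$
It is shown in Theorem 11.18 in Ref. [47] that $\mathrm{Tr}\, P\sigma\, (\sigma+t\Delta+\lambda)^{-1} \Delta\, (\sigma+t\Delta+\lambda)^{-1} \ge 0$. Now it is easy to see that $\mathrm{Tr}\, P\Delta\, (\sigma+t\Delta+\lambda)^{-1} \Delta\, (\sigma+t\Delta+\lambda)^{-1} = \mathrm{Tr}\, P\left[ \Delta (\sigma+t\Delta+\lambda)^{-1} \right]^2 \ge 0$. Adding these two quantities and using $\rho = \sigma + \Delta$, we get (A10) and the result follows. In particular, since log is an operator monotone function, the claim implies that
$$\mathrm{Tr}\, P\rho\, (\log\rho - \log\sigma) \ge 0. \tag{A11}$$
□

Lemma 6. Let $\rho, \sigma \ge 0$, $P = P_{\{\rho - \gamma\sigma \ge 0\}}$, and $\log\gamma > S(\rho\|\sigma)$, then
$$\mathrm{Tr}\, P\rho \le \frac{\mathrm{Tr}\, \rho\, (\log\rho - \log\sigma)^2 - [S(\rho\|\sigma)]^2}{[\log\gamma - S(\rho\|\sigma)]^2}. \tag{A12}$$

Proof. We can rewrite (A12) as
$$\mathrm{Tr}\, P\rho \le \frac{\mathrm{Tr}\, \rho\, \{\log\rho - \log(\gamma\sigma) + [\log\gamma - S(\rho\|\sigma)]\, \mathbb{1}\}^2}{[\log\gamma - S(\rho\|\sigma)]^2}. \tag{A13}$$
It suffices to show that
\begin{align}
\mathrm{Tr}\, P\rho &\overset{a}{\le} \frac{\mathrm{Tr}\, P\rho\, \{\log\rho - \log(\gamma\sigma) + [\log\gamma - S(\rho\|\sigma)]\, \mathbb{1}\}^2}{[\log\gamma - S(\rho\|\sigma)]^2} \tag{A14} \\
&= \frac{\mathrm{Tr}\, P\rho\, [\log\rho - \log(\gamma\sigma)]^2}{[\log\gamma - S(\rho\|\sigma)]^2} + \frac{2\, \mathrm{Tr}\, P\rho\, [\log\rho - \log(\gamma\sigma)]}{[\log\gamma - S(\rho\|\sigma)]} + \mathrm{Tr}\, P\rho, \tag{A15}
\end{align}
where in a, the sufficient condition is due to multiplication by $P$. The first term is non-negative and the second term is non-negative using Lemma 5, and the inequality follows. □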

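Lemma 7 (Quantum Sibson identity). For any quantum state $\rho^{AB}$ in system $AB$ and $D_\lambda$ the Rényi divergence of order $\lambda$, we have
$$D_\lambda(\rho^{AB} \| \mathbb{1} \otimes \sigma^{B}) = D_\lambda(\sigma^{*} \| \sigma^{B}) + \frac{\lambda}{\lambda-1} \log \mathrm{Tr}\left[ \mathrm{Tr}_A (\rho^{AB})^{\lambda} \right]^{1/\lambda}, \tag{A16}$$
where
$$\sigma^{*} = \frac{\left[ \mathrm{Tr}_A (\rho^{AB})^{\lambda} \right]^{1/\lambda}}{\mathrm{Tr}\left[ \mathrm{Tr}_A (\rho^{AB})^{\lambda} \right]^{1/\lambda}}.$$

Proof. For the classical Sibson identity, see Ref. [49]. Note that
\begin{align}
D_\lambda(\rho^{AB} \| \mathbb{1} \otimes \sigma^{B}) &= \frac{1}{\lambda-1} \log \mathrm{Tr}\, (\rho^{AB})^{\lambda} \left[ \mathbb{1} \otimes (\sigma^{B})^{1-\lambda} \right] \tag{A17} \\
&= \frac{1}{\lambda-1} \log \mathrm{Tr}\left\{ \left[ \mathrm{Tr}_A (\rho^{AB})^{\lambda} \right] (\sigma^{B})^{1-\lambda} \right\} \tag{A18} \\
&= \frac{1}{\lambda-1} \log \left\{ \left( \mathrm{Tr}\left[ \mathrm{Tr}_A (\rho^{AB})^{\lambda} \right]^{1/\lambda} \right)^{\lambda} \mathrm{Tr}\, (\sigma^{*})^{\lambda} (\sigma^{B})^{1-\lambda} \right\} \tag{A19} \\
&= \frac{1}{\lambda-1} \log \mathrm{Tr}\, (\sigma^{*})^{\lambda} (\sigma^{B})^{1-\lambda} + \frac{\lambda}{\lambda-1} \log \mathrm{Tr}\left[ \mathrm{Tr}_A (\rho^{AB})^{\lambda} \right]^{1/\lambda} \tag{A20} \\
&= D_\lambda(\sigma^{*} \| \sigma^{B}) + \frac{\lambda}{\lambda-1} \log \mathrm{Tr}\left[ \mathrm{Tr}_A (\rho^{AB})^{\lambda} \right]^{1/\lambda}. \tag{A21}
\end{align}
□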
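The quantum Sibson identity of Lemma 7 lends itself to a direct numerical check. The sketch below assumes the Rényi divergence $D_\lambda(\rho\|\sigma) = \frac{1}{\lambda-1}\log \mathrm{Tr}\, \rho^{\lambda}\sigma^{1-\lambda}$ with natural logarithms, random full-rank test states, and hypothetical helper names (`mpow`, `trace_out_A`, `sibson_sides`) introduced only for this illustration.

```python
import numpy as np


def random_state(dim, rng):
    """Random full-rank density matrix (normalized Wishart matrix)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def mpow(herm, t):
    """Real power of a positive definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(herm)
    w = np.clip(w, 1e-15, None)  # guard against tiny negative rounding errors
    return (v * w ** t) @ v.conj().T


def trace_out_A(x, dA, dB):
    """Partial trace over the first subsystem A of an operator on A (x) B."""
    return np.trace(x.reshape(dA, dB, dA, dB), axis1=0, axis2=2)


def renyi(rho, sigma, lam):
    """(1/(lam-1)) log Tr rho^lam sigma^(1-lam); sigma need not be normalized."""
    val = np.trace(mpow(rho, lam) @ mpow(sigma, 1.0 - lam)).real
    return np.log(val) / (lam - 1.0)


def sibson_sides(lam=1.5, dA=2, dB=3, seed=0):
    """Return (left-hand side, right-hand side) of the identity (A16)."""
    rng = np.random.default_rng(seed)
    rho_ab = random_state(dA * dB, rng)
    sigma_b = random_state(dB, rng)
    lhs = renyi(rho_ab, np.kron(np.eye(dA), sigma_b), lam)
    root = mpow(trace_out_A(mpow(rho_ab, lam), dA, dB), 1.0 / lam)
    norm = np.trace(root).real
    rhs = renyi(root / norm, sigma_b, lam) + lam / (lam - 1.0) * np.log(norm)
    return lhs, rhs


if __name__ == "__main__":
    print(sibson_sides())
```

The two returned numbers should agree to rounding error for any admissible $\lambda \neq 1$, which mirrors the chain (A17) to (A21).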
Lemma 8. If $\mathcal{R} > \max_{\rho^{AA'}} I(A\rangle B)_\sigma$, then
$$\exists\, t \in [-1/2, 0), \text{ such that } \forall\, s \in (t, 0), \quad -s\mathcal{R} + \min_{\rho^{AA'}} E(s, \mathcal{N})_\rho > 0. \tag{A22}$$

Proof. The proof follows the same argument as Lemma 3 in Ref. 0. Let $g(s, \rho^{AA'}) := -s\mathcal{R} + E(s, \mathcal{N})_\rho$ and suppose that $\mathcal{R} > \max_{\rho^{AA'}} I(A\rangle B)_\sigma$. Note that $\forall\, \rho^{AA'}$ we have $g(0, \rho^{AA'}) = 0$ and
$$g'(0, \rho^{AA'}) = -\mathcal{R} + I(A\rangle B)_\sigma < 0. \tag{A23}$$
Now suppose that (A22) does not hold. Then
$$\forall\, t \in [-1/2, 0),\ \exists\, s \in (t, 0), \text{ such that } \min_{\rho^{AA'}} g(s, \rho^{AA'}) \le 0. \tag{A24}$$
Hence, there exists a real sequence $\{s_n\}$ and a sequence $\{\rho_n^{AA'}\} \subset \mathcal{S}(\mathcal{H}_{AA'})$ such that
$$s_n \in \left( -\frac{1}{n+1},\, 0 \right) \text{ and } g(s_n, \rho_n^{AA'}) \le 0. \tag{A25}$$
Now since $\mathcal{S}(\mathcal{H}_{AA'})$ is a compact set (see Ref. [50]), there exists a subsequence of $\{\rho_n^{AA'}\}$ that converges to some $\rho_\infty^{AA'}$ as $n \to \infty$. Without loss of generality we can assume that $\rho_n^{AA'} \to \rho_\infty^{AA'}$. From the mean value theorem, it follows that
$$\forall\, n,\ \exists\, r_n \in (s_n, 0), \text{ such that } g'(r_n, \rho_n^{AA'}) = \frac{g(0, \rho_n^{AA'}) - g(s_n, \rho_n^{AA'})}{-s_n} \ge 0. \tag{A26}$$
Since $g'(s, \rho^{AA'})$ is a continuous function of $(s, \rho^{AA'})$, (A26) yields $g'(0, \rho_\infty^{AA'}) \ge 0$, which contradicts (A23). □

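Lemma 9. Consider a cq-state $\rho^{AB} = \sum_x P_X(x)\, |x\rangle\langle x|^{A} \otimes \sigma_x^{B}$, where $P_X$ is a probability distribution and $\sigma_x^{B} \in \mathcal{S}(\mathcal{H}_B)$. Then for all such cq-states,
$$\mathrm{Tr}\, \rho^{AB} \left[ \log\rho^{AB} - \log(\rho^{A} \otimes \rho^{B}) \right]^2 \le g(|AB|) + g(|B|), \tag{A27}$$
where for any $d \in \mathbb{N}$, $g(1) = 0$,
$$g(d) := \begin{cases} 0.563, & d = 2, \\ \log^2 d, & d \ge 3. \end{cases} \tag{A28}$$

Proof. It is not difficult to see that for $\rho^{A} = \mathrm{Tr}_B\, \rho^{AB}$, $\rho^{B} = \mathrm{Tr}_A\, \rho^{AB}$,
\begin{align}
\left[ \log\rho^{AB} - \log(\rho^{A} \otimes \rho^{B}) \right]^2 = {} & \log^2\rho^{AB} - \log^2\rho^{A} \otimes \mathbb{1} + \mathbb{1} \otimes \log^2\rho^{B} \tag{A29} \\
& - 2\sum_x \log P_X(x)\, |x\rangle\langle x|^{A} \otimes \log\sigma_x^{B} \tag{A30} \\
& - \sum_x |x\rangle\langle x|^{A} \otimes \left( \log\sigma_x^{B} \log\rho^{B} + \log\rho^{B} \log\sigma_x^{B} \right). \tag{A31}
\end{align}
Each term above with the negative sign contributes negatively when we take the trace and hence, neglecting these terms gives us an upper bound. We are left with
$$\mathrm{Tr}\, \rho^{AB} \log^2\rho^{AB} + \mathrm{Tr}\, \rho^{B} \log^2\rho^{B}. \tag{A32}$$
Using the arguments in Appendix E in Ref. [23], it follows that for a quantum state $\sigma$ of dimension $d$, $\mathrm{Tr}\, \sigma \log^2\sigma \le g(d)$. QED. □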
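The constant 0.563 in (A28) can be reproduced by a simple grid search over qubit spectra. This is only a sketch: it assumes natural logarithms (the base under which the value 0.563 comes out) and uses the hypothetical helper names `second_moment` and `max_binary_second_moment`.

```python
import math


def second_moment(probs):
    """Tr sigma log^2 sigma for a state with eigenvalues probs (natural log)."""
    return sum(q * math.log(q) ** 2 for q in probs if q > 0.0)


def max_binary_second_moment(grid=20000):
    """Grid maximum of x log^2 x + (1-x) log^2 (1-x) over x in (0, 1)."""
    return max(
        second_moment((i / grid, 1.0 - i / grid)) for i in range(1, grid)
    )


if __name__ == "__main__":
    print("d = 2 maximum    :", max_binary_second_moment())
    print("d = 3 at uniform :", second_moment([1 / 3] * 3), math.log(3) ** 2)
```

For $d = 2$ the maximum is not attained at the uniform spectrum, which is why (A28) treats that case separately from $d \ge 3$.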

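Lemma 10. Let $|\psi\rangle^{XY_1Y_2}$ be a maximally entangled state, i.e.,
$$|\psi\rangle^{XY_1Y_2} = \frac{1}{\sqrt{|Y_1||Y_2|}} \sum_{i_1=1}^{|Y_1|} \sum_{i_2=1}^{|Y_2|} |i_1 i_2\rangle^{X} |i_1 i_2\rangle^{Y_1Y_2}, \tag{A33}$$
where $|X| \ge |Y_1||Y_2|$, and $\{|i_1 i_2\rangle^{X}\}$ and $\{|i_1 i_2\rangle^{Y_1Y_2}\}$ are any orthonormal bases in $\mathcal{H}_X$ and $\mathcal{H}_{Y_1} \otimes \mathcal{H}_{Y_2}$ respectively. Then, $|Y_2|\, \rho^{XY_1}$ is a projector where $\rho^{XY_1} = \mathrm{Tr}_{Y_2} |\psi\rangle\langle\psi|^{XY_1Y_2}$.

Proof. It is easy to see that
$$\rho^{XY_1} = \frac{1}{|Y_1||Y_2|} \sum_{i_1, j_1 = 1}^{|Y_1|} \sum_{i_2, j_2 = 1}^{|Y_2|} |i_1 i_2\rangle\langle j_1 j_2|^{X} \otimes \mathrm{Tr}_{Y_2}\left( |i_1 i_2\rangle\langle j_1 j_2|^{Y_1Y_2} \right). \tag{A34}$$
To prove the claim, it suffices to show that $(|Y_2|\, \rho^{XY_1})^2 = |Y_2|\, \rho^{XY_1}$ or
$$\frac{1}{|Y_1|} \sum_{j_1=1}^{|Y_1|} \sum_{j_2=1}^{|Y_2|} \mathrm{Tr}_{Y_2}\left( |i_1 i_2\rangle\langle j_1 j_2|^{Y_1Y_2} \right) \mathrm{Tr}_{Y_2}\left( |j_1 j_2\rangle\langle l_1 l_2|^{Y_1Y_2} \right) = \mathrm{Tr}_{Y_2}\left( |i_1 i_2\rangle\langle l_1 l_2|^{Y_1Y_2} \right). \tag{A35}$$
Now consider the Schmidt decomposition of $|i_1 i_2\rangle^{Y_1Y_2}$, i.e.,
$$|i_1 i_2\rangle^{Y_1Y_2} = \sum_k \sqrt{\alpha_{k i_1 i_2}}\, |k i_1 i_2\rangle^{Y_1} |k i_1 i_2\rangle^{Y_2}. \tag{A36}$$
Substituting in (A35) and simplifying, we have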
LHS of 4A35J) = — — ^2 y/<Xkhi2 a l'ji i' % a ijxh Q'fi 32 1 Ijih)** Qjih I khi2) Y2 1 fc« 1*2) (^1^2 T 1 ( A37 ) 



I ill 



E v/^i^^'j'i^ (%2l ya I K7T E a ihi2\ l h32)^hh\ Y2 I 



fcii*2 



I**i«2>(6"ij2| yi (A38) 

= E ^ j ^ 1 y ' |k*i*2> y2 |A;ii«2> (61 J2 1 * (A39) 

k,V 

= RHS of ( fA331) . (A40) 
where in a, we have replaced the term inside the parenthesis by an identity matrix since 

4t E aihh\l3\32)(lji32\ Y2 = |-Srr E Tr n (I^X^I^ 2 ) = I- (A41) 

QED. □ 
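The projector property of Lemma 10 can be confirmed numerically in a small case. The sketch below makes two simplifying assumptions that the lemma does not require: it takes $|X| = |Y_1||Y_2|$ and uses the computational product basis on $Y_1Y_2$. The function name `reduced_state` is a hypothetical helper.

```python
import numpy as np


def reduced_state(d1, d2):
    """rho^{X Y1} = Tr_{Y2} |psi><psi| for the maximally entangled |psi> of (A33).

    Assumes |X| = d1 * d2 and the product basis |i1>|i2> on Y1 Y2.
    """
    dx = d1 * d2
    psi = np.zeros((dx, d1, d2))
    for i1 in range(d1):
        for i2 in range(d2):
            psi[i1 * d2 + i2, i1, i2] = 1.0 / np.sqrt(dx)
    # Rows index (X, Y1), columns index Y2; summing over Y2 is a matrix product.
    m = psi.reshape(dx * d1, d2)
    return m @ m.T


if __name__ == "__main__":
    d1, d2 = 2, 3
    proj = d2 * reduced_state(d1, d2)
    print("is projector:", np.allclose(proj @ proj, proj))
    print("rank        :", np.linalg.matrix_rank(proj))
```

In this setting the rank of $|Y_2|\, \rho^{XY_1}$ equals $|Y_2|$, the dimension of the traced-out system.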



Appendix B: Proof of Theorem 5

Note that (85) follows easily. We now verify (86) using the following differentiation rule (Lemma 4 in Ref. [16]) for a Hermitian operator $X(s)$ parametrized by a real parameter $s$:
$$\frac{\partial}{\partial s} \mathrm{Tr}\, g[X(s)] = \mathrm{Tr}\left\{ g'[X(s)]\, \frac{\partial X(s)}{\partial s} \right\}. \tag{B1}$$

Let the spectral decomposition of $\sigma^{AB}$ be $\sigma^{AB} = \sum_i \lambda_i |i\rangle\langle i|^{AB}$ and $\sigma_i = \mathrm{Tr}_A |i\rangle\langle i|$. Hence, we get $\sigma^{B} = \mathrm{Tr}_A\, \sigma^{AB} = \sum_i \lambda_i \sigma_i$ and $\kappa_1 := \mathrm{Tr}_A (\sigma^{AB})^{1/(s+1)} = \sum_i \lambda_i^{1/(s+1)} \sigma_i$. It is easy to see using (B1) that $d\kappa_1/ds = -\kappa_2/(s+1)$, where $\kappa_2 = \sum_i \lambda_i^{1/(s+1)} \log\left( \lambda_i^{1/(s+1)} \right) \sigma_i$. It now follows that

Tr«f («2 - «i logKi) 



9s 



Tr«; +1 



9s 



_ = Tr [ S Ai ( log A ^ - ( 2 AiCJi ) log ( A ^) 



H(B) a -H(A,B) a , 
I(A)B) a . 



(B2) 

(B3) 

(B4) 
(B5) 



Now we show that $g(s) + (s+1) \log|A|$ is an increasing function in $s$. Consider the operators $E_i = \sqrt{\sigma_i / |A|}$. Then $\sum_i E_i^{\dagger} E_i = \sum_i \sigma_i / |A| = \mathbb{1}$. Since $x^{\gamma}$, $\gamma \in (0,1]$, is operator concave, we have, using the operator Jensen's inequality and for $1/2 \le \alpha < \beta \le 1$, $\gamma = \alpha/\beta$,



(B6) 



or $g(\alpha - 1) + \alpha \log|A| \le g(\beta - 1) + \beta \log|A|$.



